Add outdated package analysis support#41

Merged
TongWu merged 13 commits into main from dev
Sep 17, 2025

Conversation

@TongWu
Owner

@TongWu TongWu commented Sep 17, 2025

Summary by CodeRabbit

  • New Features
    • Added a command-line report that analyzes your requirements, checks PyPI, and suggests vulnerability-free upgrade targets (latest or second-latest major).
    • Outputs a CSV with package name, current version, upgrade availability/instruction, and last-active dates for the current major and the package.
    • Supports options to select the requirements file, set the output path, and limit the number of packages processed.
  • Style
    • Formatting updates only; no functional changes.

@coderabbitai
Contributor

coderabbitai bot commented Sep 17, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (2)
  • MonthlyReport/2025-09/MonthlyReport-202509-17-1246.xlsx is excluded by !**/*.xlsx
  • WeeklyReport/2025-09-15/WeeklyReport_20250917_123253.csv is excluded by !**/*.csv

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduces a new OutdatedPackageAnalysis script and supporting utilities to analyze Python package upgrades, community activity, and vulnerability-free version suggestions. Adds CommunityActivityUtils for activity timelines and a new async helper in VersionSuggester. Adjusts formatting in GenerateReport without logic changes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Reporting & Analysis Scripts: GenerateReport.py, OutdatedPackageAnalysis.py | GenerateReport.py: formatting-only line breaks. OutdatedPackageAnalysis.py: new CLI/async script to read requirements, query PyPI, suggest safe upgrades per major via utils, fetch community activity dates, and write a CSV report with structured rows via PackageReport. |
| Community Activity Utilities: utils/CommunityActivityUtils.py | New module to compute package activity: parse inputs, fetch PyPI metadata, resolve GitHub repo, get last activity (GraphQL/REST with caching, auth, backoff), and derive dates for current major and overall package activity. Public loaders and builders included. |
| Version Suggestion Utilities: utils/VersionSuggester.py | New async API find_latest_safe_version_for_major(...): filters versions by target major, avoids improper downgrades, queries OSV concurrently (semaphore), returns first vulnerability-free candidate or None; handles invalid versions safely. |
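The filtering step described for find_latest_safe_version_for_major can be pictured with the following minimal sketch. It is not the PR's code: parse_version and candidates_for_major are hypothetical names, and the parser only accepts plain dotted numeric versions, whereas the real module presumably uses a full version parser. It illustrates keeping only versions in the target major, skipping unparsable strings, and never proposing a downgrade.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...] | None:
    """Parse a plain dotted numeric version such as '2.31.0'; return None otherwise.

    Simplification for illustration: pre-releases and other suffixes are rejected.
    """
    parts = text.strip().split(".")
    if not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def candidates_for_major(
    current: str, all_versions: list[str], target_major: int
) -> list[str]:
    """Return versions in target_major that are newer than current, newest first."""
    cur = parse_version(current)
    if cur is None:
        return []
    parsed = []
    for raw in all_versions:
        ver = parse_version(raw)
        if ver is None:  # skip invalid or non-numeric version strings
            continue
        if ver[0] == target_major and ver > cur:  # never suggest a downgrade
            parsed.append((ver, raw))
    parsed.sort(reverse=True)
    return [raw for _, raw in parsed]
```

Each candidate in the returned list would then be checked against OSV in order, stopping at the first one with no known vulnerabilities.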

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User/CLI
  participant OA as OutdatedPackageAnalysis
  participant RF as Requirements File
  participant PY as PyPI API
  participant VS as VersionSuggester (OSV)
  participant CA as CommunityActivityUtils (GitHub)

  U->>OA: main(args)
  OA->>RF: read package==version list
  loop per package (async)
    OA->>PY: fetch releases/metadata
    OA->>VS: find_latest_safe_version_for_major(pkg, curr, all, major)
    VS->>VS: filter/sort candidates
    VS->>VS: query OSV (semaphore-limited)
    OA->>CA: get_activity_dates(pkg, curr, pypi_info)
    CA->>PY: (optional) use pre-fetched PyPI info
    CA->>CA: resolve GitHub repo
    CA->>GitHub: GraphQL/REST (cached, token)
    CA-->>OA: last-active dates
  end
  OA-->>U: CSV report path
sequenceDiagram
  autonumber
  participant CA as CommunityActivityUtils
  participant PY as PyPI API
  participant GH as GitHub API
  CA->>PY: Get package info
  CA->>CA: Extract repo URL
  alt GraphQL available
    CA->>GH: GraphQL last activity
  else REST fallback
    CA->>GH: REST last activity (ETag/If-Modified-Since)
  end
  CA->>CA: Compute latest dates (current major, overall)
  CA-->>Caller: ("YYYY-MM-DD" | "Unknown", "YYYY-MM-DD" | "Unknown")
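The REST fallback in the diagram relies on conditional requests (ETag / If-Modified-Since). A minimal offline sketch of that bookkeeping follows. The function names are hypothetical; the cache entry shape (etag, last_modified, payload) mirrors the comment on _ETAG_CACHE quoted later in this review, and no network call is made here.

```python
from __future__ import annotations


def conditional_headers(cache: dict, key: str) -> dict[str, str]:
    """Build validator headers from a previously cached response, if any."""
    entry = cache.get(key, {})
    headers: dict[str, str] = {}
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers


def apply_response(
    cache: dict, key: str, status: int, headers: dict, payload: dict | None
) -> dict | None:
    """Return the payload to use: reuse the cache on 304, refresh it on 200."""
    if status == 304 and key in cache:
        return cache[key]["payload"]  # server says the cached copy is still valid
    if status == 200:
        cache[key] = {
            "etag": headers.get("ETag"),
            "last_modified": headers.get("Last-Modified"),
            "payload": payload,
        }
        return payload
    return None  # any other status: caller decides whether to retry
```

Reusing the cached payload on 304 avoids re-downloading unchanged repository data when the server confirms the cached copy is still current.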

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Push from dev to main #13 — Both modify utils/VersionSuggester to add new version-suggestion helpers, indicating overlapping logic for upgrade recommendation.

Poem

I twitch my nose at versions old,
Hop-hop through majors, brave and bold.
I sniff out bugs the OSV way,
And mark the dates of GitHub play.
A CSV trail where carrots gleam—
Upgrades found! A tidy stream. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.97% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Add outdated package analysis support" succinctly and accurately describes the primary change set: adding OutdatedPackageAnalysis.py and supporting utilities (utils/CommunityActivityUtils.py and a VersionSuggester helper) to analyze and report outdated Python packages. It is a single clear sentence that conveys the main intent and is directly related to the changeset rather than being vague or off-topic. |

Comment @coderabbitai help to get the list of available commands and usage tips.

… sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@TongWu TongWu temporarily deployed to WT_WeeklyTriggerEnv September 17, 2025 04:34 — with GitHub Actions Inactive
@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
utils/VersionSuggester.py (1)

45-76: Guard against accidental use of legacy suggest_upgrade_version.

Per prior learning, this was deprecated in favor of suggest_safe_minor_upgrade. Ensure it’s not referenced.

Run:

#!/bin/bash
rg -nP '\bsuggest_upgrade_version\s*\(' -g '!**/dist/**' -g '!**/build/**'
utils/CommunityActivityUtils.py (2)

320-349: Re: prior “Incomplete URL substring sanitization” alert — normalization now anchors on hostname.

_normalize_github_url uses urlparse.hostname and explicit allowlist for github.com/www.github.com, mitigating substring tricks.


414-419: Incorrect GitHub GraphQL endpoint (uses github.com instead of api.github.com).

Current code calls https://github.com/graphql which will fail; should be https://api.github.com/graphql.

Apply this diff:

-    gql_url = f"{_GITHUB_API.replace('api.', '')}/graphql"
+    gql_url = f"{_GITHUB_API}/graphql"
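To see why the original expression produces the wrong host, the two forms can be evaluated side by side. This assumes _GITHUB_API holds "https://api.github.com", which the review comment implies but the diff does not show.

```python
# Assumed value of the module constant; not shown in the diff itself.
_GITHUB_API = "https://api.github.com"

# Original expression: stripping "api." turns the API host into the website host.
old_gql_url = f"{_GITHUB_API.replace('api.', '')}/graphql"

# Suggested expression: keep the API host as-is.
new_gql_url = f"{_GITHUB_API}/graphql"

print(old_gql_url)  # https://github.com/graphql
print(new_gql_url)  # https://api.github.com/graphql
```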
🧹 Nitpick comments (9)
utils/VersionSuggester.py (3)

195-207: Narrow exception handling around OSV checks.

Catching Exception hides real failures and trips BLE001. Limit to aiohttp/timeout errors.

Apply this diff:

-    async with aiohttp.ClientSession() as session:
-        sem = asyncio.Semaphore(5)
-        for _, ver_str in candidates:
-            try:
-                _, status, _ = await fetch_osv(session, pkg, ver_str, sem)
-            except Exception as exc:  # pragma: no cover - network safety
-                logger.warning(
-                    f"Failed to verify vulnerabilities for {pkg}=={ver_str}: {exc}"
-                )
-                continue
+    async with aiohttp.ClientSession() as session:
+        sem = asyncio.Semaphore(5)
+        for _, ver_str in candidates:
+            try:
+                _, status, _ = await fetch_osv(session, pkg, ver_str, sem)
+            except (aiohttp.ClientError, asyncio.TimeoutError) as exc:  # pragma: no cover
+                logger.warning("OSV check failed for %s==%s: %s", pkg, ver_str, exc)
+                continue

136-141: Allow caller-provided session/semaphore to avoid per-call client creation.

This reduces overhead when scanning many packages (used by OutdatedPackageAnalysis).

Apply this diff:

-async def find_latest_safe_version_for_major(
+async def find_latest_safe_version_for_major(
     pkg: str,
     current_version: str,
     all_versions: list[str],
     target_major: int,
-) -> str | None:
+    session: aiohttp.ClientSession | None = None,
+    sem: asyncio.Semaphore | None = None,
+) -> str | None:
@@
-    async with aiohttp.ClientSession() as session:
-        sem = asyncio.Semaphore(5)
-        for _, ver_str in candidates:
+    own_session = session is None
+    if sem is None:
+        sem = asyncio.Semaphore(5)
+    if own_session:
+        session = aiohttp.ClientSession()
+    try:
+        for _, ver_str in candidates:
             try:
                 _, status, _ = await fetch_osv(session, pkg, ver_str, sem)
             except (aiohttp.ClientError, asyncio.TimeoutError) as exc:  # pragma: no cover - network safety
                 logger.warning(
                     f"Failed to verify vulnerabilities for {pkg}=={ver_str}: {exc}"
                 )
                 continue
@@
-    return None
+    finally:
+        if own_session and session:
+            await session.close()
+    return None

Also applies to: 195-209


110-127: Optional: check candidates concurrently (bounded) for faster results.

You can fire a small batch (e.g., top 8) with asyncio.as_completed under the same semaphore; stop on first “No”.

I can provide a concrete patch if you’d like this optimization.
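One possible shape for that optimization is sketched below. It checks candidates in bounded batches and returns the first clean one in candidate order. Note that it uses asyncio.gather per batch rather than asyncio.as_completed, so that the newest safe version wins even if an older one finishes first. The is_vulnerable callable is a hypothetical stand-in for the OSV lookup, not the signature of fetch_osv.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def first_safe_version(
    candidates: list[str],
    is_vulnerable: Callable[[str], Awaitable[bool]],
    batch_size: int = 8,
    limit: int = 5,
) -> str | None:
    """Return the first candidate (in the given order) reported as not vulnerable."""
    sem = asyncio.Semaphore(limit)

    async def check(ver: str) -> bool:
        async with sem:  # cap the number of in-flight lookups
            return await is_vulnerable(ver)

    for start in range(0, len(candidates), batch_size):
        batch = candidates[start : start + batch_size]
        # return_exceptions=True keeps one failed lookup from aborting the batch
        results = await asyncio.gather(
            *(check(v) for v in batch), return_exceptions=True
        )
        for ver, res in zip(batch, results):
            if res is False:  # exceptions and True both count as "not proven safe"
                return ver
    return None
```

The trade-off is that a batch may issue a few lookups that turn out to be unnecessary, in exchange for lower latency when the first candidates are vulnerable.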

utils/CommunityActivityUtils.py (4)

61-63: Narrow cache init exception and log why cache is ignored.

Avoid bare Exception and silent fallback.

Apply this diff:

-try:
-    _ETAG_CACHE = json.loads(_CACHE_PATH.read_text(encoding="utf-8"))
-except Exception:
-    _ETAG_CACHE = {}  # {key: {"etag": "...", "last_modified": "...", "payload": {...}, "fetched_at": "..."}}
+try:
+    _ETAG_CACHE = json.loads(_CACHE_PATH.read_text(encoding="utf-8"))
+except (OSError, json.JSONDecodeError) as exc:
+    logger.debug("Initializing GitHub cache failed: %s", exc)
+    _ETAG_CACHE = {}  # {key: {"etag": "...", "last_modified": "...", "payload": {...}, "fetched_at": "..."}}
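As a standalone illustration of the narrowed handler, here is a minimal sketch with a hypothetical load_etag_cache helper. One detail worth noting: Path.read_text can raise UnicodeDecodeError on a corrupt file, which is a ValueError but neither an OSError nor a json.JSONDecodeError, so this sketch catches ValueError (the parent of both) instead. Whether that case matters for the real cache file is an assumption.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_etag_cache(path: Path) -> dict:
    """Load a JSON cache file, falling back to an empty dict with a debug log."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # OSError: missing or unreadable file.
        # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
        logger.debug("Ignoring cache at %s: %s", path, exc)
        return {}
    # Guard against a valid JSON document that is not an object.
    return data if isinstance(data, dict) else {}
```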

83-86: Do not swallow cache write failures.

Log at debug to aid diagnosability.

Apply this diff:

-    try:
-        _CACHE_PATH.write_text(json.dumps(_ETAG_CACHE), encoding="utf-8")
-    except Exception:
-        pass
+    try:
+        _CACHE_PATH.write_text(json.dumps(_ETAG_CACHE), encoding="utf-8")
+    except OSError as exc:
+        logger.debug("Writing GitHub cache failed: %s", exc)

285-295: Overbroad try/except in host check.

urlparse won’t raise here; remove blanket except to avoid masking bugs.

Apply this diff:

-def _is_github_host(url: str) -> bool:
-    try:
-        parsed = urlparse(url)
-        hostname = parsed.hostname
-        if not hostname:
-            return False
-        hostname = hostname.lower()
-        return hostname == "github.com" or hostname.endswith(".github.com")
-    except Exception:
-        return False
+def _is_github_host(url: str) -> bool:
+    parsed = urlparse(url)
+    hostname = (parsed.hostname or "").lower()
+    return hostname == "github.com" or hostname.endswith(".github.com")
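A small runnable sketch shows how the hostname comparison behaves on lookalike URLs. is_github_host is a hypothetical copy of the helper above, with one difference stated plainly: urlparse can raise ValueError on some malformed inputs (for example an unbalanced IPv6 bracket), so this version catches that specific exception rather than dropping the handler entirely.

```python
from urllib.parse import urlparse


def is_github_host(url: str) -> bool:
    """Return True only when the parsed hostname is github.com or a subdomain."""
    try:
        hostname = (urlparse(url).hostname or "").lower()
    except ValueError:  # e.g. "http://[::1" raises "Invalid IPv6 URL"
        return False
    return hostname == "github.com" or hostname.endswith(".github.com")


for candidate in (
    "https://github.com/owner/repo",
    "https://api.github.com/repos/owner/repo",
    "https://github.com.evil.example/owner/repo",
    "https://evil.example/github.com",
    "https://notgithub.com/owner/repo",
):
    print(candidate, is_github_host(candidate))
```

Comparing the parsed hostname, rather than searching the whole URL for "github.com", is what rejects the third and fourth inputs.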

450-457: Minor: rename unused loop variable to underscore to satisfy linters.

No behavior change.

Apply this diff:

-    for attempt in range(max_retries):
+    for _attempt in range(max_retries):
@@
-    for attempt in range(max_retries):
+    for _attempt in range(max_retries):

Also applies to: 513-520

OutdatedPackageAnalysis.py (2)

268-281: Process packages concurrently with a bounded semaphore.

Significantly reduces wall-clock time while respecting external rate limits.

Apply this diff:

-async def _generate_reports(
-    packages: list[tuple[str, str]],
-) -> list[PackageReport]:
-    """Process packages sequentially and collect report rows."""
-
-    results: list[PackageReport] = []
-    total = len(packages)
-    for idx, (name, version_str) in enumerate(packages, start=1):
-        logger.info("[%d/%d] Evaluating %s==%s", idx, total, name, version_str)
-        report = await _process_package(name, version_str)
-        if report:
-            results.append(report)
-    return results
+async def _generate_reports(
+    packages: list[tuple[str, str]],
+) -> list[PackageReport]:
+    """Process packages concurrently with a small cap."""
+    sem = asyncio.Semaphore(5)
+    total = len(packages)
+
+    async def run_one(idx: int, name: str, version_str: str) -> PackageReport | None:
+        async with sem:
+            logger.info("[%d/%d] Evaluating %s==%s", idx, total, name, version_str)
+            return await _process_package(name, version_str)
+
+    tasks = [run_one(i, n, v) for i, (n, v) in enumerate(packages, start=1)]
+    results = await asyncio.gather(*tasks)
+    return [r for r in results if r]
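The semaphore-plus-gather pattern in this diff can be tried in isolation with the sketch below. run_bounded and the worker argument are hypothetical names, and the worker stands in for _process_package. asyncio.gather returns results in input order, so the report rows keep the order of the requirements file even though packages finish at different times.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def run_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int = 5,
) -> list[Any]:
    """Run worker over items with at most `limit` running at once."""
    sem = asyncio.Semaphore(limit)

    async def run_one(item: Any) -> Any:
        async with sem:  # waits here when `limit` workers are already active
            return await worker(item)

    # gather preserves the order of its arguments in the result list
    return await asyncio.gather(*(run_one(item) for item in items))
```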

286-294: Clarify CSV header label.

Header says “Is Major/Second Major Version” but code may emit “Current Major”. Make the header explicit.

Apply this diff:

-        "Is Major/Second Major Version",
+        "Target Major (Latest/Second/Current)",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6cd24df and 8409746.

⛔ Files ignored due to path filters (2)
  • MonthlyReport/2025-09/MonthlyReport-202509-16-1749.xlsx is excluded by !**/*.xlsx
  • WeeklyReport/2025-09-15/WeeklyReport_20250916_173749.csv is excluded by !**/*.csv
📒 Files selected for processing (4)
  • GenerateReport.py (1 hunks)
  • OutdatedPackageAnalysis.py (1 hunks)
  • utils/CommunityActivityUtils.py (1 hunks)
  • utils/VersionSuggester.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-24T03:17:27.150Z
Learnt from: TongWu
PR: TongWu/PythonPackageManager#14
File: utils/VersionSuggester.py:136-137
Timestamp: 2025-06-24T03:17:27.150Z
Learning: In utils/VersionSuggester.py, the suggest_upgrade_version function has been intentionally disabled as it's outdated. The preferred approach is to use suggest_safe_minor_upgrade function instead for version suggestions.

Applied to files:

  • utils/VersionSuggester.py
🧬 Code graph analysis (3)
utils/CommunityActivityUtils.py (1)
utils/PyPiUtils.py (1)
  • GetPyPiInfo (27-44)
utils/VersionSuggester.py (1)
utils/VulnChecker.py (1)
  • fetch_osv (30-102)
OutdatedPackageAnalysis.py (3)
utils/CommunityActivityUtils.py (1)
  • get_activity_dates (113-140)
utils/PyPiUtils.py (1)
  • GetPyPiInfo (27-44)
utils/VersionSuggester.py (1)
  • find_latest_safe_version_for_major (136-209)
🪛 Ruff (0.12.2)
utils/CommunityActivityUtils.py

62-62: Do not catch blind exception: Exception

(BLE001)


85-86: try-except-pass detected, consider logging the exception

(S110)


85-85: Do not catch blind exception: Exception

(BLE001)


293-293: Do not catch blind exception: Exception

(BLE001)


450-450: Loop control variable attempt not used within loop body

Rename unused attempt to _attempt

(B007)


455-455: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


474-474: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


478-478: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


513-513: Loop control variable attempt not used within loop body

Rename unused attempt to _attempt

(B007)


518-518: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


532-532: Do not catch blind exception: Exception

(BLE001)

utils/VersionSuggester.py

200-200: Do not catch blind exception: Exception

(BLE001)

OutdatedPackageAnalysis.py

1-1: Shebang is present but file is not executable

(EXE001)


312-312: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
GenerateReport.py (1)

320-322: Formatting-only change — LGTM.

No logic impact detected on monthly_df columns.

Comment on lines +211 to +221
async def _process_package(
    package: str,
    current_version_str: str,
) -> PackageReport | None:
    """Inspect a single package and generate a report entry when outdated."""

    info = GetPyPiInfo(package)
    if not info:
        logger.warning("PyPI metadata unavailable for %s; skipping", package)
        return None

⚠️ Potential issue

Blocking requests inside async path — move to threads.

GetPyPiInfo and get_activity_dates use requests; calling them directly in async blocks the event loop.

Apply this diff:

-    info = GetPyPiInfo(package)
+    info = await asyncio.to_thread(GetPyPiInfo, package)
@@
-    last_active_current_major, last_active_package = get_activity_dates(
-        package, current_version_str, info
-    )
+    last_active_current_major, last_active_package = await asyncio.to_thread(
+        get_activity_dates, package, current_version_str, info
+    )

Also applies to: 253-255

🤖 Prompt for AI Agents
In OutdatedPackageAnalysis.py around lines 211-221 (and similarly at 253-255),
the synchronous HTTP calls (GetPyPiInfo and get_activity_dates) are being
invoked directly inside an async function which blocks the event loop; change
those calls to run on a thread by using asyncio.to_thread (or
loop.run_in_executor) and await the result (e.g., info = await
asyncio.to_thread(GetPyPiInfo, package) and activity = await
asyncio.to_thread(get_activity_dates, ...)), update imports if needed (import
asyncio), and ensure error handling/logging remains the same after awaiting the
threaded call.
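The effect of asyncio.to_thread can be seen in a self-contained sketch. blocking_fetch is a hypothetical stand-in for a requests-based helper such as GetPyPiInfo; it sleeps instead of making a network call.

```python
import asyncio
import time


def blocking_fetch(name: str) -> dict:
    """Stand-in for a synchronous HTTP helper; sleeps instead of calling a server."""
    time.sleep(0.05)
    return {"name": name}


async def fetch_all(names: list[str]) -> list[dict]:
    """Run each blocking call in a worker thread so the event loop stays free."""
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_fetch, n) for n in names)
    )


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["requests", "numpy"])))
```

asyncio.to_thread is available from Python 3.9; on older interpreters loop.run_in_executor serves the same purpose.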

@TongWu TongWu merged commit 589ee0b into main Sep 17, 2025
5 checks passed